Guide Actor-Critic for Continuous Control
Abstract
Actor-critic methods solve reinforcement learning problems by updating a parameterized policy, known as an actor, in a direction that increases an estimate of the expected return, known as a critic. However, existing actor-critic methods use only the values or gradients of the critic to update the policy parameter. In this paper, we propose a novel actor-critic method called the guide actor-critic (GAC). GAC first learns a guide actor that locally maximizes the critic and then updates the policy parameter based on the guide actor by supervised learning. Our main theoretical contributions are twofold. First, we show that GAC updates the guide actor by performing second-order optimization in the action space, where the curvature matrix is based on the Hessians of the critic. Second, we show that the deterministic policy gradient method is a special case of GAC when the Hessians are ignored. Through experiments, we show that our method is a promising reinforcement learning method for continuous control.
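To make the contrast between a second-order step in action space and a plain gradient step concrete, here is a minimal sketch. It is not the GAC algorithm from the paper. It assumes a hypothetical two-dimensional quadratic critic at one fixed state, and every name in it (`critic_grad`, `critic_hessian`, `second_order_step`, `first_order_step`, `peak`, `curvature`) is invented for illustration.

```python
# Toy critic at a fixed state (an assumption, not the paper's critic):
#   Q(a) = -0.5 * (a - peak)^T C (a - peak), with C positive definite,
# so Q is concave in the action and is maximized at a = peak.

def critic_grad(action, peak, curvature):
    """Gradient of the toy critic with respect to the action: -C (a - peak)."""
    d = [a - p for a, p in zip(action, peak)]
    return [-(curvature[i][0] * d[0] + curvature[i][1] * d[1]) for i in range(2)]

def critic_hessian(curvature):
    """Hessian of the toy critic with respect to the action: -C."""
    return [[-c for c in row] for row in curvature]

def solve_2x2(m, v):
    """Solve m x = v for a 2x2 matrix m using Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * v[0] - m[0][1] * v[1]) / det,
            (m[0][0] * v[1] - m[1][0] * v[0]) / det]

def second_order_step(action, grad, hessian):
    """Newton-style update a - H^{-1} g, which uses the critic's curvature."""
    step = solve_2x2(hessian, grad)
    return [a - s for a, s in zip(action, step)]

def first_order_step(action, grad, lr):
    """Gradient-ascent update a + lr * g, which ignores the Hessian."""
    return [a + lr * g for a, g in zip(action, grad)]

peak = [1.0, -2.0]                    # hypothetical maximizer of the critic
curvature = [[4.0, 1.0], [1.0, 3.0]]  # hypothetical positive definite C
action = [0.0, 0.0]                   # action currently proposed by the policy

grad = critic_grad(action, peak, curvature)
newton_action = second_order_step(action, grad, critic_hessian(curvature))
gradient_action = first_order_step(action, grad, lr=0.1)

print("second-order step:", newton_action)
print("first-order step: ", gradient_action)
```

For this quadratic toy critic the second-order step lands on the maximizer in a single update, while the first-order step only moves a fraction of the way along the gradient. A learned critic is not quadratic, so a real method would need some safeguard on the curvature matrix. The sketch leaves that out.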
Similar resources
Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU
We introduce a hybrid CPU/GPU version of the Asynchronous Advantage Actor-Critic (A3C) algorithm, currently the state-of-the-art method in reinforcement learning for various gaming tasks. We analyze its computational traits and concentrate on aspects critical to leveraging the GPU's computational power. We introduce a system of queues and a dynamic scheduling strategy, potentially helpful for ot...
A Direct Control Method For a Class of Nonlinear Systems Using Neural Networks
CUED/F-INFENG/TR.65 Cambridge University Engineering Department, Trumpington Street, Cambridge CB2 1PZ, England. March 1991. A direct control scheme for a class of continuous time nonlinear systems using neural networks is presented. The objective of control is to track a desired reference signal. This objective is achieved through input/output linearization of the system with neural networks. The...
Improving Communication of Critical Domain Knowledge in High-Consequence Software Development: An Empirical Study
K. S. Hanks; University of Virginia; Charlottesville, Virginia. J. C. Knight; University of Virginia; Charlottesville, Virginia. Keywords: requirements, natural language, safety-critical. Abstract: Poor requirements are implicated in a disproportionate number of defects in sa...
Lecture Notes in Computer Science
Nonlinear stabilization by hybrid quantized feedback. Daniel Liberzon, Dept. of Elect. Eng., Yale University, New Haven, CT 06520-8267, U.S.A. daniel.liberzon@yale.edu Abstract. This paper is concerned with global asymptotic stabilization of continuous-time control systems by means of quantized feedback. For linear systems, a hybrid control strategy for dealing with this problem was recently proposed ...
G-frames and their duals for Hilbert C*-modules
Abstract. Certain facts about frames and generalized frames (g-frames) are extended to g-frames for Hilbert C*-modules. It is shown that g-frames for Hilbert C*-modules share several useful properties with those for Hilbert spaces. The paper also characterizes the operators which preserve the class of g-frames for Hilbert C*-modules. Moreover, a necessary and sufficient condition is ob- ...